Guided Project: Police killings

Posted on Wed 08 July 2015 in Projects

In [4]:
import pandas as pd
police_killings = pd.read_csv("police_killings.csv", encoding="ISO-8859-1")
police_killings.head(5)
Out[4]:
                 name  age  gender    raceethnicity     month  day  year           streetaddress          city  state  ...  share_hispanic  p_income  h_income  county_income  comp_income  county_bucket  nat_bucket   pov     urate   college
0  A'donte Washington   16    Male            Black  February   23  2015            Clearview Ln     Millbrook     AL  ...             5.6     28375     51367          54766     0.937936              3           3  14.1  0.097686  0.168510
1      Aaron Rutledge   27    Male            White     April    2  2015  300 block Iris Park Dr     Pineville     LA  ...             0.5     14678     27972          40930     0.683411              2           1  28.8  0.065724  0.111402
2         Aaron Siler   26    Male            White     March   14  2015    22nd Ave and 56th St       Kenosha     WI  ...            16.8     25286     45365          54930     0.825869              2           3  14.6  0.166293  0.147312
3        Aaron Valdez   25    Male  Hispanic/Latino     March   11  2015       3000 Seminole Ave    South Gate     CA  ...            98.8     17194     48295          55909     0.863814              3           3  11.7  0.124827  0.050133
4        Adam Jovicic   29    Male            White     March   19  2015          364 Hiwood Ave  Munroe Falls     OH  ...             1.7     33954     68785          49669     1.384868              5           4   1.9  0.063550  0.403954

5 rows × 34 columns

In [2]:
police_killings.columns
Out[2]:
Index(['name', 'age', 'gender', 'raceethnicity', 'month', 'day', 'year',
       'streetaddress', 'city', 'state', 'latitude', 'longitude', 'state_fp',
       'county_fp', 'tract_ce', 'geo_id', 'county_id', 'namelsad',
       'lawenforcementagency', 'cause', 'armed', 'pop', 'share_white',
       'share_black', 'share_hispanic', 'p_income', 'h_income',
       'county_income', 'comp_income', 'county_bucket', 'nat_bucket', 'pov',
       'urate', 'college'],
      dtype='object')
In [10]:
counts = police_killings["raceethnicity"].value_counts()
list(counts.index)
Out[10]:
['White',
 'Black',
 'Hispanic/Latino',
 'Unknown',
 'Asian/Pacific Islander',
 'Native American']
In [14]:
%matplotlib inline
import matplotlib.pyplot as plt

plt.bar(range(6), counts)
plt.xticks(range(6), counts.index, rotation="vertical")
Out[14]:
([<matplotlib.axis.XTick at 0x10800db70>,
  <matplotlib.axis.XTick at 0x10809f4e0>,
  <matplotlib.axis.XTick at 0x106a85748>,
  <matplotlib.axis.XTick at 0x106b67128>,
  <matplotlib.axis.XTick at 0x106b67b38>,
  <matplotlib.axis.XTick at 0x106b6a588>],
 <a list of 6 Text xticklabel objects>)
In [15]:
counts / sum(counts)
Out[15]:
White                     0.505353
Black                     0.289079
Hispanic/Latino           0.143469
Unknown                   0.032120
Asian/Pacific Islander    0.021413
Native American           0.008565
dtype: float64

Racial breakdown

It looks like people identified as Black are far overrepresented among those killed relative to their share of the US population (29% vs 16%). You can see the breakdown of population by race here.

People identified as Hispanic appear to be killed roughly in proportion to their share of the population (14% of the people killed were Hispanic, versus 17% of the overall population).

Whites are underrepresented among those killed relative to their share of the population, as are Asians.
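One way to make the over- and underrepresentation comparison explicit is to divide each group's share of people killed by its share of the population. The sketch below uses the shares from the output above and the population percentages quoted in the text; the variable names are only illustrative, and the population figures are the post's numbers rather than verified census values.

```python
import pandas as pd

# Shares of people killed, copied from the counts / sum(counts) output above.
killed_share = pd.Series({"Black": 0.289079, "Hispanic/Latino": 0.143469})

# Population shares as quoted in the text (16% and 17%). These are the
# post's figures, not independently checked census values.
population_share = pd.Series({"Black": 0.16, "Hispanic/Latino": 0.17})

# A ratio above 1 means the group is overrepresented among those killed;
# a ratio below 1 means it is underrepresented.
representation_ratio = killed_share / population_share
print(representation_ratio.round(2))
```

A ratio is easier to compare across groups than two raw percentages, since it does not depend on how large each group is.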

In [22]:
police_killings["p_income"][police_killings["p_income"] != "-"].astype(float).hist(bins=20)
Out[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x107f797f0>
In [25]:
police_killings["p_income"][police_killings["p_income"] != "-"].astype(float).median()
Out[25]:
22348.0

Income breakdown

According to the Census, median personal income in the US is $28,567, while the median in our data is $22,348, which suggests that police killings tend to happen in less affluent areas. Our sample size is relatively small, though, so it's hard to draw sweeping conclusions.
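The p_income column stores numbers as strings and uses "-" for missing values, which is why the cells above filter before converting. As a hedged alternative, pd.to_numeric with errors="coerce" does the filtering and conversion in one step. The series below is a small stand-in (four values from the first rows of the dataset plus one "-" added for illustration), not the full column.

```python
import pandas as pd

# Stand-in for police_killings["p_income"]: four values from the head()
# output above, plus one "-" placeholder added for illustration.
p_income = pd.Series(["28375", "14678", "-", "25286", "17194"])

# errors="coerce" turns anything that is not a number (here "-") into NaN.
income = pd.to_numeric(p_income, errors="coerce")

national_median = 28567  # figure quoted in the text

print(income.isna().sum())                  # how many values were missing
print(income.median())                      # median() skips NaN by default
print(income.median() < national_median)    # is the sample median lower?
```

Coercing instead of filtering keeps the row positions intact, which helps if the income values need to be lined up with other columns later.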

In [29]:
state_pop = pd.read_csv("state_population.csv")
In [38]:
counts = police_killings["state_fp"].value_counts()

states = pd.DataFrame({"STATE": counts.index, "shootings": counts})
In [41]:
states = states.merge(state_pop, on="STATE")
In [48]:
states["pop_millions"] = states["POPESTIMATE2015"] / 1000000
states["rate"] = states["shootings"] / states["pop_millions"]

states.sort_values("rate")
Out[48]:
    STATE  shootings  SUMLEV  REGION  DIVISION                  NAME  POPESTIMATE2015  POPEST18PLUS2015  PCNT_POPEST18PLUS      rate  pop_millions
43      9          1      40       1         1           Connecticut          3590886           2826827               78.7  0.278483      3.590886
22     42          7      40       1         2          Pennsylvania         12802503          10112229               79.0  0.546768     12.802503
38     19          2      40       2         4                  Iowa          3123899           2395103               76.7  0.640226      3.123899
 6     36         13      40       1         2              New York         19795791          15584974               78.7  0.656705     19.795791
29     25          5      40       1         1         Massachusetts          6794422           5407335               79.6  0.735898      6.794422
42     33          1      40       1         1         New Hampshire          1330608           1066610               80.2  0.751536      1.330608
45     23          1      40       1         1                 Maine          1329328           1072948               80.7  0.752260      1.329328
11     17         11      40       2         3              Illinois         12859995           9901322               77.0  0.855366     12.859995
12     39         10      40       2         3                  Ohio         11613423           8984946               77.4  0.861073     11.613423
31     55          5      40       2         3             Wisconsin          5771337           4476711               77.6  0.866350      5.771337
16     26          9      40       2         3              Michigan          9922576           7715272               77.8  0.907023      9.922576
28     47          6      40       3         6             Tennessee          6600299           5102688               77.3  0.909050      6.600299
15     37         10      40       3         5        North Carolina         10042802           7752234               77.2  0.995738     10.042802
36     32          3      40       4         8                Nevada          2890845           2221681               76.9  1.037759      2.890845
18     51          9      40       3         5              Virginia          8382993           6512571               77.7  1.073602      8.382993
40     54          2      40       3         5         West Virginia          1844128           1464532               79.4  1.084523      1.844128
25     27          6      40       2         4             Minnesota          5489594           4205207               76.6  1.092977      5.489594
20     18          8      40       2         3               Indiana          6619680           5040224               76.1  1.208518      6.619680
 8     34         11      40       1         2            New Jersey          8958013           6959192               77.7  1.227951      8.958013
35      5          4      40       3         7              Arkansas          2978204           2272904               76.3  1.343091      2.978204
 2     12         29      40       3         5               Florida         20271272          16166143               79.7  1.430596     20.271272
44     11          1      40       3         5  District of Columbia           672228            554121               82.4  1.487591      0.672228
 9     53         11      40       4         9            Washington          7170351           5558509               77.5  1.534095      7.170351
 5     13         16      40       3         5               Georgia         10214860           7710688               75.5  1.566346     10.214860
23     21          7      40       3         6              Kentucky          4425092           3413425               77.1  1.581888      4.425092
13     29         10      40       2         4              Missouri          6083672           4692196               77.1  1.643744      6.083672
21      1          8      40       3         6               Alabama          4858979           3755483               77.3  1.646436      4.858979
14     24         10      40       3         5              Maryland          6006401           4658175               77.6  1.664891      6.006401
30     49          5      40       4         8                  Utah          2995919           2083423               69.5  1.668937      2.995919
46     56          1      40       4         8               Wyoming           586107            447212               76.3  1.706173      0.586107
 1     48         47      40       3         7                 Texas         27469114          20257343               73.7  1.711013     27.469114
17     45          9      40       3         5        South Carolina          4896146           3804558               77.7  1.838180      4.896146
 0      6         74      40       4         9            California         39144818          30023902               76.7  1.890416     39.144818
37     30          2      40       4         8               Montana          1032949            806529               78.1  1.936204      1.032949
19     41          8      40       4         9                Oregon          4028977           3166121               78.6  1.985616      4.028977
26     28          6      40       3         6           Mississippi          2992333           2265485               75.7  2.005124      2.992333
24     20          6      40       2         4                Kansas          2911641           2192084               75.3  2.060694      2.911641
41     10          2      40       3         5              Delaware           945934            741548               78.4  2.114312      0.945934
 7      8         12      40       4         8              Colorado          5456574           4199509               77.0  2.199182      5.456574
10     22         11      40       3         7             Louisiana          4670724           3555911               76.1  2.355095      4.670724
32     35          5      40       4         8            New Mexico          2085109           1588201               76.2  2.397956      2.085109
33     16          4      40       4         8                 Idaho          1654930           1222093               73.8  2.417021      1.654930
39      2          2      40       4         9                Alaska           738432            552166               74.8  2.708442      0.738432
34     15          4      40       4         9                Hawaii          1431603           1120770               78.3  2.794071      1.431603
27     31          6      40       2         4              Nebraska          1896190           1425853               75.2  3.164240      1.896190
 3      4         25      40       4         8               Arizona          6828065           5205215               76.2  3.661359      6.828065
 4     40         22      40       3         7              Oklahoma          3911338           2950017               75.4  5.624674      3.911338

Killings by state

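The rate here is the number of killings per million residents. States in the West and South seem to have the highest rates (six of the ten highest-rate states are in the West), whereas states in the Northeast and Midwest seem to have the lowest.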
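To check a regional pattern like this without eyeballing the table, the counts can be pooled by the REGION column. The sketch below uses four rows copied from the table above, so each region has only one state and the result is purely illustrative. It assumes REGION follows the standard Census Bureau coding (1 Northeast, 2 Midwest, 3 South, 4 West).

```python
import pandas as pd

# Four rows copied from the table above, one per region, for illustration.
states = pd.DataFrame({
    "NAME": ["Connecticut", "Nebraska", "Oklahoma", "Arizona"],
    "REGION": [1, 2, 3, 4],
    "shootings": [1, 6, 22, 25],
    "POPESTIMATE2015": [3590886, 1896190, 3911338, 6828065],
})

# Killings per million residents, as in the notebook cell above.
states["rate"] = states["shootings"] / (states["POPESTIMATE2015"] / 1_000_000)

# Pool counts and population within each region before dividing, so that
# small states with only a handful of killings do not dominate the average.
by_region = states.groupby("REGION")[["shootings", "POPESTIMATE2015"]].sum()
by_region["rate"] = by_region["shootings"] / (by_region["POPESTIMATE2015"] / 1_000_000)

# Assumed mapping: standard Census Bureau region codes.
by_region.index = by_region.index.map({1: "Northeast", 2: "Midwest", 3: "South", 4: "West"})
print(by_region.sort_values("rate"))
```

Run against the full merged states DataFrame, the same groupby would give one pooled rate per region instead of a visual impression from 47 rows.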

In [50]:
police_killings["state"].value_counts()
Out[50]:
CA    74
TX    46
FL    29
AZ    25
OK    22
GA    16
NY    14
CO    12
LA    11
WA    11
IL    11
NJ    11
MO    10
MD    10
OH    10
NC    10
VA     9
SC     9
MI     9
AL     8
OR     8
IN     8
PA     7
KY     7
MS     6
KS     6
NE     6
TN     6
MN     6
UT     5
MA     5
WI     5
NM     5
ID     4
HI     4
AR     4
NV     3
MT     2
AK     2
DE     2
WV     2
IA     2
DC     1
CT     1
NH     1
WY     1
ME     1
dtype: int64
In [66]:
# Keep only rows where all three share columns are present ("-" marks a missing value).
# .copy() makes pk an independent DataFrame, so the assignments below
# don't raise SettingWithCopyWarning.
pk = police_killings[
    (police_killings["share_white"] != "-") &
    (police_killings["share_black"] != "-") &
    (police_killings["share_hispanic"] != "-")
].copy()

pk["share_white"] = pk["share_white"].astype(float)
pk["share_black"] = pk["share_black"].astype(float)
pk["share_hispanic"] = pk["share_hispanic"].astype(float)
In [67]:
lowest_states = ["CT", "PA", "IA", "NY", "MA", "NH", "ME", "IL", "OH", "WI"]
highest_states = ["OK", "AZ", "NE", "HI", "AK", "ID", "NM", "LA", "CO", "DE"]

ls = pk[pk["state"].isin(lowest_states)]
hs = pk[pk["state"].isin(highest_states)]
In [68]:
columns = ["pop", "county_income", "share_white", "share_black", "share_hispanic"]

ls[columns].mean()
Out[68]:
pop                4201.660714
county_income     54830.839286
share_white          60.616071
share_black          21.257143
share_hispanic       12.948214
dtype: float64
In [69]:
hs[columns].mean()
Out[69]:
pop                4315.750000
county_income     48706.967391
share_white          55.652174
share_black          11.532609
share_hispanic       20.693478
dtype: float64

State by state rates

It looks like the states with low rates of police killings tend to have a higher proportion of Black residents (21.3% vs 11.5%) and a lower proportion of Hispanic residents (12.9% vs 20.7%) in the census tracts where the killings occur. The counties where the killings occur in those states also have higher incomes on average (about $54,831 vs $48,707).

States with high rates of police killings tend to have high Hispanic population shares in the census tracts where the killings occur.
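Putting the two sets of means side by side makes these differences easier to read. The sketch below copies the values from the ls[columns].mean() and hs[columns].mean() outputs above; the column labels lowest_rate_states and highest_rate_states are names chosen here for illustration.

```python
import pandas as pd

# Column means copied from the two .mean() outputs above.
lowest = pd.Series({
    "pop": 4201.660714,
    "county_income": 54830.839286,
    "share_white": 60.616071,
    "share_black": 21.257143,
    "share_hispanic": 12.948214,
})
highest = pd.Series({
    "pop": 4315.750000,
    "county_income": 48706.967391,
    "share_white": 55.652174,
    "share_black": 11.532609,
    "share_hispanic": 20.693478,
})

# One row per variable, one column per group of states.
comparison = pd.DataFrame({
    "lowest_rate_states": lowest,
    "highest_rate_states": highest,
})

# Positive values mean the figure is larger in the high-rate states.
comparison["difference"] = (
    comparison["highest_rate_states"] - comparison["lowest_rate_states"]
)
print(comparison.round(2))
```

In the notebook itself, the same table could be built directly from ls[columns].mean() and hs[columns].mean() rather than from copied numbers.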